Recursive Adaptation of Stepsize Parameter for Non-stationary Environments
Abstract
In this article, we propose a method for adapting the stepsize parameter used in reinforcement learning to dynamic environments. In typical reinforcement learning settings, the stepsize parameter is decreased to zero during learning, because the environment is assumed to be noisy but stationary, so that the true expected rewards are fixed. In the real world, however, we assume that the true expected reward changes over time, and hence the learning agent must adapt to the change through continuous learning. We derive the higher-order derivatives of the exponential moving average (which is used to estimate the expected values of states or actions in major reinforcement learning methods) with respect to the stepsize parameter. We also present a mechanism for calculating these derivatives recursively. Using this mechanism, we construct a precise and flexible adaptation method for the stepsize parameter that minimizes squared error or maximizes a given criterion. The proposed method is validated both theoretically and experimentally.
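The recursive idea can be illustrated with a minimal sketch. It is not the article's algorithm: it assumes the common update est ← est + α(x − est), differentiates that update with respect to α while treating α as constant, and uses a plain clipped gradient step on the squared error. The function names and the parameters `meta_rate`, `alpha_min`, and `alpha_max` are hypothetical.

```python
def ema_with_derivatives(xs, alpha, order=2, init=0.0):
    """Run an exponential moving average over xs with a fixed stepsize alpha.

    Returns [ema, d(ema)/d(alpha), ..., d^order(ema)/d(alpha)^order].
    Each derivative is updated recursively from the previous step's values,
    which follows from differentiating est_next = (1 - alpha) * est + alpha * x.
    """
    d = [init] + [0.0] * order  # d[0] is the EMA, d[k] its k-th derivative
    for x in xs:
        new = [d[0] + alpha * (x - d[0])]
        for k in range(1, order + 1):
            term = (1.0 - alpha) * d[k] - k * d[k - 1]
            if k == 1:
                term += x  # the first derivative also picks up the raw sample
            new.append(term)
        d = new
    return d


def adaptive_ema(xs, alpha=0.1, meta_rate=0.01,
                 alpha_min=0.01, alpha_max=1.0, init=0.0):
    """Track xs while adjusting alpha by a gradient step (an assumed rule).

    The step reduces 0.5 * err**2, whose derivative with respect to alpha
    is -err * grad, where grad is the recursively maintained d(est)/d(alpha).
    """
    est, grad = init, 0.0
    history = []
    for x in xs:
        err = x - est
        alpha = min(alpha_max, max(alpha_min, alpha + meta_rate * err * grad))
        grad = (1.0 - alpha) * grad + err  # first-derivative recursion
        est = est + alpha * err            # ordinary EMA update
        history.append((est, alpha))
    return history
```

With a signal such as fifty zeros followed by fifty fives, `adaptive_ema` raises the stepsize after the jump, because the error and the stored derivative then share a sign. In practice the clipping bounds and `meta_rate` would need tuning for the scale of the rewards.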
Similar resources
Adaptive Mobile Localization Using the Residual Test Method
Determining mobile location from time-of-arrival (TOA) signals is a requirement in cellular mobile communication. In some previous methods, localization over non-line-of-sight (NLOS) paths can lead to large position errors. Also, for simplicity, most simulations treat actual non-stationary environments as stationary. This paper proposes (residual test + recursive least square)...
Steady-state Performance Analysis of Bayesian Adaptive Filtering
Adaptive filtering is in principle intended for tracking nonstationary systems. However, most adaptive filtering algorithms have been designed for converging to a fixed unknown filter. When actually confronted with a non-stationary environment, they possess just one parameter (stepsize, window size) to adjust their tracking capability. In the stationary case of non-stationarity, the optimal fil...
The Time Adaptive Self Organizing Map for Distribution Estimation
The feature map represented by the set of weight vectors of the basic SOM (Self-Organizing Map) provides a good approximation to the input space from which the sample vectors come. But the time-decreasing learning rate and neighborhood function of the basic SOM algorithm reduce its capability to adapt weights for a varied environment. In dealing with non-stationary input distributions and changi...
Stochastic Recursive Algorithms for Networked Systems with Delay and Random Switching: Multiscale Formulations and Asymptotic Properties
Motivated by consensus control of networked systems with communication latency and randomly switching topologies, this paper studies stochastic approximation (SA) algorithms for systems with time delays and randomly switching dynamics. To accommodate realistic time delay systems, our formulation of the discrete-time systems does not impose bounds on delays when the sampling intervals become sma...
Performance Analysis of Bayesian Adaptive Filtering
While adaptive filtering is in principle intended for tracking non-stationary systems, most adaptive filtering algorithms have been designed for converging to a fixed unknown filter. When actually confronted with a non-stationary environment, they possess just one parameter (stepsize, forgetting factor) to adjust their tracking capability. Virtually the only existing optimal approach is the Kal...